{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Similarity search with Langchain and Open AI\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/integrations/langchain/langchain-vector-store.ipynb)\n",
    "\n",
    "\n",
    "This workbook shows example of similiarity search for a search query and demonstrates filtering of metadata. First  we   split the documents into chunks using `langchain` and then index into elasticsearch through [`ElasticsearchStore.from_documents`](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install packages and import modules\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# install packages\n",
    "!python3 -m pip install -qU langchain langchain-elasticsearch openai tiktoken\n",
    "\n",
    "# import modules\n",
    "from getpass import getpass\n",
    "from langchain_elasticsearch import ElasticsearchStore\n",
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from urllib.request import urlopen\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "import json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to Elasticsearch\n",
    "\n",
    "ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial. \n",
    "\n",
    "We'll use the **Cloud ID** to identify our deployment, because we are using Elastic Cloud deployment. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n",
    "\n",
    "\n",
    "We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our elastic cloud deployment. This would help create and index data easily. In the ElasticsearchStore instance, will set `embedding` to [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.openai.OpenAIEmbeddings.html) to embed the texts and also set the elasticsearch index name that will be used in this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n",
    "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
    "\n",
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
    "ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")\n",
    "\n",
    "# https://platform.openai.com/api-keys\n",
    "OPENAI_API_KEY = getpass(\"OpenAI API key: \")\n",
    "\n",
    "embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)\n",
    "\n",
    "vector_store = ElasticsearchStore(\n",
    "    es_cloud_id=ELASTIC_CLOUD_ID,\n",
    "    es_api_key=ELASTIC_API_KEY,\n",
    "    index_name=\"workplace_index\",\n",
    "    embedding=embeddings,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download the dataset \n",
    "\n",
    "Let's download the sample dataset and deserialize the document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/example-apps/chatbot-rag-app/data/data.json\"\n",
    "\n",
    "response = urlopen(url)\n",
    "\n",
    "workplace_docs = json.loads(response.read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Split Documents into Passages\n",
    "\n",
    "\n",
    "We will chunk these documents into 800 token passages with an overlap of 0 tokens using a simple splitter. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metadata = []\n",
    "content = []\n",
    "\n",
    "for doc in workplace_docs:\n",
    "    content.append(doc[\"content\"])\n",
    "    metadata.append(\n",
    "        {\n",
    "            \"name\": doc[\"name\"],\n",
    "            \"summary\": doc[\"summary\"],\n",
    "            \"rolePermissions\": doc[\"rolePermissions\"],\n",
    "        }\n",
    "    )\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
    "    chunk_size=512, chunk_overlap=256\n",
    ")\n",
    "docs = text_splitter.create_documents(content, metadatas=metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Index data into elasticsearch\n",
    "\n",
    "Next we will index data to elasticsearch using [ElasticsearchStore.from_documents](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents). We will use Cloud ID,  Password and Index name values set in the `Create cloud deployment` step.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = vector_store.from_documents(\n",
    "    docs,\n",
    "    embeddings,\n",
    "    es_cloud_id=ELASTIC_CLOUD_ID,\n",
    "    es_api_key=ELASTIC_API_KEY,\n",
    "    index_name=\"workplace_index\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Results functions\n",
    "Next, we will create a small function to show the results of our query in human-readable outputs. This function would be used in our examples to display the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 159,
   "metadata": {},
   "outputs": [],
   "source": [
    "def showResults(output):\n",
    "    print(\"Total results: \", len(output))\n",
    "    for index in range(len(output)):\n",
    "        print(output[index])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying the dataset with similarity_search\n",
    "\n",
    "Now that we have indexed our sample data to elasticsearch, we will perform a similarity search on query - `How does the compensation work?`. By default returns top `4` documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total results:  4\n",
      "page_content='Compensation Bands:\\nBased on the job levels, the following compensation bands have been established:\\na. Entry-Level Band: This band encompasses salary ranges for employees in entry-level positions. It aims to provide competitive compensation for individuals starting their careers within the company.\\n\\nb. Intermediate-Level Band: This band covers salary ranges for employees who have gained moderate experience and expertise in their respective roles. It rewards employees for their growing skill set and contributions.\\n\\nc. Senior-Level Band: The senior-level band includes salary ranges for experienced employees who have attained advanced skills and have a proven track record of delivering results. It reflects the increased responsibilities and expectations placed upon these individuals.' metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content='Performance-Based Compensation:\\nIn addition to the defined compensation bands, we emphasize a performance-based compensation model. Performance evaluations will be conducted regularly, and employees exceeding performance expectations will be eligible for bonuses, incentives, and salary increases. This approach rewards high achievers and motivates employees to excel in their roles.\\n\\nConclusion:\\nBy implementing this compensation bands strategy, our IT company aims to establish fair and competitive compensation practices that align with market standards and foster employee satisfaction. Regular evaluations and market benchmarking will enable us to adapt and refine the strategy to meet the evolving needs of our organization.' metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content=\"Introduction:\\nThis document outlines the compensation bands strategy for the various teams within our IT company. The goal is to establish a fair and competitive compensation structure that aligns with industry standards, rewards performance, and attracts top talent. By implementing this strategy, we aim to foster employee satisfaction and retention while ensuring the company's overall success.\\n\\nPurpose:\\nThe purpose of this compensation bands strategy is to:\\na. Define clear guidelines for salary ranges based on job levels and market benchmarks.\\nb. Support equitable compensation practices across different teams.\\nc. Encourage employee growth and performance.\\nd. Enable effective budgeting and resource allocation.\" metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content='Job Levels:\\nTo establish a comprehensive compensation structure, we have defined distinct job levels within each team. These levels reflect varying degrees of skills, experience, and responsibilities. The levels include:\\na. Entry-Level: Employees with limited experience or early career professionals.\\nb. Intermediate-Level: Employees with moderate experience and demonstrated competence.\\nc. Senior-Level: Experienced employees with advanced skills and leadership capabilities.\\nd. Leadership-Level: Managers and team leaders responsible for strategic decision-making.' metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n"
     ]
    }
   ],
   "source": [
    "query = \"How does the compensation work?\"\n",
    "results = documents.similarity_search(query)\n",
    "\n",
    "showResults(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying the dataset show top 10 documents\n",
    "\n",
    "Now we will set `k=10` and try same query to see top 10 documents.  \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 161,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total results:  10\n",
      "page_content='Compensation Bands:\\nBased on the job levels, the following compensation bands have been established:\\na. Entry-Level Band: This band encompasses salary ranges for employees in entry-level positions. It aims to provide competitive compensation for individuals starting their careers within the company.\\n\\nb. Intermediate-Level Band: This band covers salary ranges for employees who have gained moderate experience and expertise in their respective roles. It rewards employees for their growing skill set and contributions.\\n\\nc. Senior-Level Band: The senior-level band includes salary ranges for experienced employees who have attained advanced skills and have a proven track record of delivering results. It reflects the increased responsibilities and expectations placed upon these individuals.' metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content='Performance-Based Compensation:\\nIn addition to the defined compensation bands, we emphasize a performance-based compensation model. Performance evaluations will be conducted regularly, and employees exceeding performance expectations will be eligible for bonuses, incentives, and salary increases. This approach rewards high achievers and motivates employees to excel in their roles.\\n\\nConclusion:\\nBy implementing this compensation bands strategy, our IT company aims to establish fair and competitive compensation practices that align with market standards and foster employee satisfaction. Regular evaluations and market benchmarking will enable us to adapt and refine the strategy to meet the evolving needs of our organization.' metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content=\"Introduction:\\nThis document outlines the compensation bands strategy for the various teams within our IT company. The goal is to establish a fair and competitive compensation structure that aligns with industry standards, rewards performance, and attracts top talent. By implementing this strategy, we aim to foster employee satisfaction and retention while ensuring the company's overall success.\\n\\nPurpose:\\nThe purpose of this compensation bands strategy is to:\\na. Define clear guidelines for salary ranges based on job levels and market benchmarks.\\nb. Support equitable compensation practices across different teams.\\nc. Encourage employee growth and performance.\\nd. Enable effective budgeting and resource allocation.\" metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content='Job Levels:\\nTo establish a comprehensive compensation structure, we have defined distinct job levels within each team. These levels reflect varying degrees of skills, experience, and responsibilities. The levels include:\\na. Entry-Level: Employees with limited experience or early career professionals.\\nb. Intermediate-Level: Employees with moderate experience and demonstrated competence.\\nc. Senior-Level: Experienced employees with advanced skills and leadership capabilities.\\nd. Leadership-Level: Managers and team leaders responsible for strategic decision-making.' metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content=\"d. Leadership-Level Band: This band comprises salary ranges for managers and team leaders responsible for guiding and overseeing their respective teams. It considers their leadership abilities, strategic thinking, and the impact they have on the company's success.\\n\\nMarket Benchmarking:\\nTo ensure our compensation remains competitive, regular market benchmarking will be conducted. This involves analyzing industry salary trends, regional compensation data, and market demand for specific roles. The findings will inform periodic adjustments to our compensation bands to maintain alignment with the market.\" metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content=\"If an employee's employment is terminated, they will be paid out for any unused vacation time, calculated based on their current rate of pay.\\nPolicy Review and Updates\\n\\nThis vacation policy will be reviewed periodically and updated as necessary, taking into account changes in labor laws, business needs, and employee feedback.\\nQuestions and Concerns\\n\\nEmployees are encouraged to direct any questions or concerns about this policy to their supervisor or the HR department.\" metadata={'name': 'Company Vacation Policy', 'summary': ': This policy outlines the guidelines and procedures for requesting and taking time off from work for personal and leisure purposes. Full-time employees accrue vacation time at a rate of [X hours] per month, equivalent to [Y days] per year. Vacation requests must be submitted to supervisors at least', 'rolePermissions': ['demo', 'manager']}\n",
      "page_content=\"Employees are encouraged to plan their vacations around the company's peak and non-peak periods to minimize disruptions. Vacation requests during peak periods may be subject to limitations and require additional advance notice.\\nVacation Pay\\n\\nEmployees will receive their regular pay during their approved vacation time. Vacation pay will be calculated based on the employee's average earnings over the [B weeks] preceding their vacation.\\nUnplanned Absences and Vacation Time\\n\\nIn the event of an unplanned absence due to illness or personal emergencies, employees may use their accrued vacation time, subject to supervisor approval. Employees must inform their supervisor as soon as possible and provide any required documentation upon their return to work.\\nVacation Time and Termination of Employment\" metadata={'name': 'Company Vacation Policy', 'summary': ': This policy outlines the guidelines and procedures for requesting and taking time off from work for personal and leisure purposes. Full-time employees accrue vacation time at a rate of [X hours] per month, equivalent to [Y days] per year. Vacation requests must be submitted to supervisors at least', 'rolePermissions': ['demo', 'manager']}\n",
      "page_content='As we continue to prioritize the well-being of our employees, we are making a slight adjustment to our hybrid work policy. Starting May 1, 2023, employees will be required to work from the office three days a week, with two days designated for remote work. Please communicate with your supervisor and HR department to establish your updated in-office workdays.' metadata={'name': 'Wfh Policy Update May 2023', 'summary': 'Starting May 1, 2023, our hybrid work policy will require employees to work from the office three days a week and two days remotely.', 'rolePermissions': ['demo', 'manager']}\n",
      "page_content='Starting May 2022, the company will be implementing a two-day in-office work requirement per week for all eligible employees. Please coordinate with your supervisor and HR department to schedule your in-office workdays while continuing to follow all safety protocols.' metadata={'name': 'April Work From Home Update', 'summary': 'Starting May 2022, employees will need to work two days a week in the office. Coordinate with your supervisor and HR department for these days while following safety protocols.', 'rolePermissions': ['demo', 'manager']}\n",
      "page_content=\"Step 4: Complete the form\\nFill out the form by entering your personal information, such as your name, Social Insurance Number (SIN), and address. Then, go through each section to claim any personal tax credits that apply to you. These credits may include:\\nBasic personal amount\\nAmount for an eligible dependant\\nAmount for infirm dependants age 18 or older\\nCaregiver amount\\nDisability amount\\nTuition and education amounts\\n\\nRead the instructions carefully for each section to ensure you claim the correct amounts.\\n\\nStep 5: Sign and date the form\\nOnce you've completed the form, sign and date it at the bottom.\" metadata={'name': 'Updating Your Tax Elections Forms', 'summary': ': This guide gives a step-by-step explanation of how to update your TD1 Personal Tax Credits Return form. Access the form from the CRA website and choose the correct version based on your province or territory of residence. Download and open the form in Adobe Reader, fill out the form by entering', 'rolePermissions': ['demo', 'manager']}\n"
     ]
    }
   ],
   "source": [
    "query = \"How does the compensation work?\"\n",
    "results = documents.similarity_search(query, k=10)\n",
    "\n",
    "showResults(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying the dataset with filtering Metadata\n",
    "We will now add metadata filtering by Keyword at query time, to match `rolePermissions` as `manager`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 163,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total results:  4\n",
      "page_content='Compensation Bands:\\nBased on the job levels, the following compensation bands have been established:\\na. Entry-Level Band: This band encompasses salary ranges for employees in entry-level positions. It aims to provide competitive compensation for individuals starting their careers within the company.\\n\\nb. Intermediate-Level Band: This band covers salary ranges for employees who have gained moderate experience and expertise in their respective roles. It rewards employees for their growing skill set and contributions.\\n\\nc. Senior-Level Band: The senior-level band includes salary ranges for experienced employees who have attained advanced skills and have a proven track record of delivering results. It reflects the increased responsibilities and expectations placed upon these individuals.' metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content='Performance-Based Compensation:\\nIn addition to the defined compensation bands, we emphasize a performance-based compensation model. Performance evaluations will be conducted regularly, and employees exceeding performance expectations will be eligible for bonuses, incentives, and salary increases. This approach rewards high achievers and motivates employees to excel in their roles.\\n\\nConclusion:\\nBy implementing this compensation bands strategy, our IT company aims to establish fair and competitive compensation practices that align with market standards and foster employee satisfaction. Regular evaluations and market benchmarking will enable us to adapt and refine the strategy to meet the evolving needs of our organization.' metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content=\"Introduction:\\nThis document outlines the compensation bands strategy for the various teams within our IT company. The goal is to establish a fair and competitive compensation structure that aligns with industry standards, rewards performance, and attracts top talent. By implementing this strategy, we aim to foster employee satisfaction and retention while ensuring the company's overall success.\\n\\nPurpose:\\nThe purpose of this compensation bands strategy is to:\\na. Define clear guidelines for salary ranges based on job levels and market benchmarks.\\nb. Support equitable compensation practices across different teams.\\nc. Encourage employee growth and performance.\\nd. Enable effective budgeting and resource allocation.\" metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n",
      "page_content='Job Levels:\\nTo establish a comprehensive compensation structure, we have defined distinct job levels within each team. These levels reflect varying degrees of skills, experience, and responsibilities. The levels include:\\na. Entry-Level: Employees with limited experience or early career professionals.\\nb. Intermediate-Level: Employees with moderate experience and demonstrated competence.\\nc. Senior-Level: Experienced employees with advanced skills and leadership capabilities.\\nd. Leadership-Level: Managers and team leaders responsible for strategic decision-making.' metadata={'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'rolePermissions': ['manager']}\n"
     ]
    }
   ],
   "source": [
    "query = \"How does the compensation work?\"\n",
    "results = documents.similarity_search(\n",
    "    query, filter=[{\"match\": {\"metadata.rolePermissions\": \"manager\"}}]\n",
    ")\n",
    "\n",
    "showResults(results)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.11.4 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
